Speaker Diarization Using Unsupervised Discriminant Analysis of Inter-channel Delay Feature
Authors
Abstract
When multiple microphones are available, estimates of inter-channel delay, which characterise a speaker’s location, can be used as features for speaker diarization. Background noise and reverberation can, however, lead to noisy features and poor performance. To ameliorate these problems, this paper presents a new approach to the discriminant analysis of delay features for speaker diarization. This novel approach, which is nonetheless unsupervised, aims to increase speaker separability in delay space. We assess the approach on subsets of four standard NIST RT datasets and demonstrate a relative improvement in diarization error rate of 25% on a separate evaluation set using delay features alone.
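As a rough illustration of what an inter-channel delay feature is, the sketch below estimates the delay between two microphone channels. The abstract does not say how the delays are estimated; GCC-PHAT is assumed here only because it is a common choice, and the function name `gcc_phat_delay` and its parameters are hypothetical.

```python
import numpy as np


def gcc_phat_delay(sig, ref, max_lag):
    """Estimate the delay (in samples) of `sig` relative to `ref`.

    Assumption: GCC-PHAT is used as the delay estimator; the source
    abstract does not specify one. `max_lag` must be at least 1.
    """
    n = len(sig) + len(ref)  # zero-pad so the circular correlation does not wrap
    sig_spec = np.fft.rfft(sig, n=n)
    ref_spec = np.fft.rfft(ref, n=n)

    # Cross-power spectrum with phase-transform (PHAT) weighting:
    # magnitudes are normalised so that only phase information remains.
    cross = sig_spec * np.conj(ref_spec)
    cross /= np.abs(cross) + 1e-12

    cc = np.fft.irfft(cross, n=n)

    # Reorder so that the array covers lags -max_lag .. +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag
```

A positive result means `sig` lags `ref`. In a multi-microphone setting, one such delay per channel pair (relative to a reference channel) would form the delay feature vector for a given analysis window.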
Similar resources
Speaker diarization of spontaneous meeting room conversations
Speaker diarization is the task of identifying “who spoke when” in an audio stream containing multiple speakers. This is an unsupervised task, as there is no a priori information about the speakers. Diagnostic studies of state-of-the-art diarization systems have isolated three main issues with these systems: overlapping speech, effects of background noise and speech/nonspeech detection errors on...
ALIZE/spkdet: a state-of-the-art open source software for speaker recognition
This paper presents the ALIZE/SpkDet open source software packages for text-independent speaker recognition. This software is based on the well-known UBM/GMM approach. It also includes the latest speaker recognition developments, such as Latent Factor Analysis (LFA) and unsupervised adaptation. Discriminant classifiers such as SVM supervectors are also provided, linked with the Nuisance Attribut...
Domain Adaptation of PLDA Models in Broadcast Diarization by Means of Unsupervised Speaker Clustering
This work presents a new strategy for diarization of high-variability data, such as multimedia information in broadcast. This variability is highly noticeable among domains (inter-domain variability among chapters, shows, genres, etc.). Therefore, each domain requires its own specific model to obtain optimal results. We propose to adapt the PLDA models of our diarization sy...
Speaker Diarization Using Gaussian Mixture Turns and Segment Matching
Speaker diarization aims to detect “who spoke when” in large audio segments. It is an important task in the processing of broadcast news audio, as it eases the selection and indexing of audio segments. In this paper, an unsupervised speaker diarization scheme is proposed using a Gaussian Mixture Model as a Universal Background Model, the Bayesian Information Criterion and fingerprint detection. A de...
Speaker Diarization Based on GMM Supervectors and Unsupervised Intra-speaker Variability Modeling
This paper presents a novel framework for speaker diarization. Audio is parameterized by a sequence of GMM-supervectors representing overlapping short segments of speech. Session-dependent intra-session intra-speaker variability is estimated online in an unsupervised manner, and is removed from the supervectors using Nuisance Attribute Projection (NAP). The supervectors are then projected using ...